Abstract

I write out and discuss how one might try to prove the continuous
time g-computation formula, in the simplest possible case: treatments
(labelled a, for actions) and covariates (l: longitudinal data) form together
a bivariate counting process.



1 Introduction 

outlines a theory of causal inference for complex longitudinal
data, when treatments can be administered and covariates observed, continuously in time. This theory is supposed to extend the earlier work of Robins
(1986, 1987, 1989, 1997), devoted to the case in which covariates and treatments take values in discrete spaces, and time advances in discrete time steps.
Already in Gill and Robins (2001), we managed to extend the theory to continuously distributed covariates and treatments. In this note, we address the
generalization to continuous time. The major part of this research programme
has already been carried out by Lok (2001, 2004). It is an open problem to complete that project with a continuous time version of the g-computation formula
and the theorems centered around it. The formula tells one how to write down
the probability distribution of an outcome of interest, in the counterfactual situation that a prechosen treatment plan g had been adhered to, rather than the
factual case that treatment was assigned haphazardly.

Lok (2001) manages to develop a martingale and counting process based
theory of Robins' (1997) statistical models, estimators and tests, without having
recourse to the g-computation formula. Is the formula, then, so central to the theory after
all? The answer is that without the formula, the statistical methodology lacks
motivation. In particular, one needs the formula in order to show that the test
statistics of Lok (2001) really do test the null hypothesis of no treatment effect,
in the sense that the counterfactual outcome under all treatment plans g has
exactly the same probability distribution.

Below we do not succeed in proving the formula, nor in establishing the wished-for
results which should follow from it. What we do is present a framework in
which these questions can hopefully be studied, and in particular, write down a
conjectural g-computation formula and the assumptions under which it is likely
to be true.

2 The model 

Suppose that as a patient is followed in time, longitudinal data is gathered and
treatment decisions or actions are taken, both continuously in time. The simplest
possible scenario is that there is only one kind of action. The only
variation in treatment is in the times at which the action is taken; the nature of
the actions at different times is irrelevant or always the same. Similarly, incoming
data takes the form of a sequence of events at random time points, and the only
relevant thing is the time of the events, not their nature. Finally we suppose
that actions and longitudinal data events are never simultaneous. The pair of
point processes therefore forms a bivariate counting process $(\boldsymbol{N}^a, \boldsymbol{N}^l)$; or if you
prefer, a single marked point process $\boldsymbol{\mu}$ with a mark space $\mathcal{X} = \{a, l\}$, say, and
component point processes $\boldsymbol{\mu}^a$, $\boldsymbol{\mu}^l$; or if you prefer, two sequences of random
positive time points with no ties between them, $(0 < T^a_1 < T^a_2 < \cdots)$, $(0 < T^l_1 <
T^l_2 < \cdots)$. Ordinary random variables are set in plain lettertype, random processes
and random measures in bold. We suppose time varies through a bounded time
interval $\mathcal{T} = [0, \tau]$ and that the total number of events of both types is finite
with probability 1. Recall that a marked point process is a random measure
assigning mass 1 to random ordered pairs of a timepoint and accompanying
mark, while a counting process counts numbers of events, of each kind, up to
each timepoint. We suppose there is no event at time zero. The relations
between these quantities are: $\boldsymbol{\mu} = \sum_j \delta_{(T^a_j, a)} + \sum_k \delta_{(T^l_k, l)}$, where $\delta_{(t,x)}$ is the
measure with point mass 1 at the point $(t, x) \in \mathcal{T} \times \mathcal{X}$; $\boldsymbol{\mu}^x(B) = \boldsymbol{\mu}(B \times \{x\})$
for each Borel set $B$ in $\mathcal{T}$ and each mark $x = a, l \in \mathcal{X}$; and $\boldsymbol{N}^x(t) = \boldsymbol{\mu}^x([0, t])$ for each
$x \in \mathcal{X}$.
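As a concrete illustration of these relations, the following Python sketch represents a realization $\mu$ of $\boldsymbol{\mu}$ as a time-ordered list of (time, mark) pairs and evaluates $N^x(t) = \mu^x([0, t])$. The list representation and the function names `marked_point_process` and `counting_process` are illustrative choices made here, not part of the note.

```python
from bisect import bisect_right


def marked_point_process(a_times, l_times):
    """Merge the action times and longitudinal data times into one realization
    of the marked point process mu: a time-ordered list of (time, mark) pairs."""
    events = sorted([(t, "a") for t in a_times] + [(t, "l") for t in l_times])
    times = [t for t, _ in events]
    # The model excludes events at time zero and simultaneous events.
    if len(set(times)) != len(times) or any(t <= 0 for t in times):
        raise ValueError("event times must be positive and never simultaneous")
    return events


def counting_process(events, mark, t):
    """N^x(t) = mu^x([0, t]): the number of events with the given mark up to time t."""
    times = [s for s, x in events if x == mark]  # still in time order
    return bisect_right(times, t)


mu = marked_point_process(a_times=[0.5, 1.5], l_times=[1.0])
print(mu)                              # [(0.5, 'a'), (1.0, 'l'), (1.5, 'a')]
print(counting_process(mu, "a", 1.2))  # 1
```

Because the counting processes are read off from the single marked list, the two descriptions cannot drift out of sync.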

We suppose that we have access to unlimited observational data, and therefore essentially know the probability distribution, for a randomly chosen patient, of the just introduced random quantities. The probability law can be
recovered from the cumulative intensity process or compensator $\boldsymbol{A}$ of the counting process $\boldsymbol{N}$ or, if you prefer, the dual predictable projection or compensator
$\boldsymbol{\nu}$ of the marked point process $\boldsymbol{\mu}$. Let $\mu$ (plain lettertype) denote a possible
realization of the random point process $\boldsymbol{\mu}$ (bold). Write $\mu_t$ for the restriction
of the measure $\mu$ to $[0, t] \times \mathcal{X}$. Then for each history of the point process up to
the time of an event, thus for each $\mu_t$ for which there is an event at timepoint
$t$ (counting time $t = 0$ as the zeroth event), we have two conditional hazard measures $\nu^x(\cdot \mid t, \mu_t)$ on $(t, \tau]$, $x = a, l$, such
that the conditional probability that the first event of $\mu$ after $t$ is in the time
interval $ds$ and has mark equal to $x$, given the history up to and including
time $t$ and given that there is no event in $(t, s)$, is $\nu^x(ds \mid t, \mu_t)$ for $s \in (t, \tau]$ and $x = a, l$. The two conditional hazard measures have no atoms in common, since we assumed there are no simultaneous
events. The dual predictable projection of $\boldsymbol{\mu}$ is the random measure defined
by $\boldsymbol{\nu}(ds \times \{x\}) = \nu^x(ds \mid t, \boldsymbol{\mu}_t)$ on the event where $t$ is the time of the last event
of $\boldsymbol{\mu}$ strictly before time $s$ (or $t = 0$ if there is none). The cumulative intensity process $\boldsymbol{A}$ is defined by
$\boldsymbol{A}^x(s) = \boldsymbol{\nu}((0, s] \times \{x\})$ for all $s$ and $x$. Thus $\boldsymbol{A}^x(ds) = \boldsymbol{\nu}^x(ds) = \nu^x(ds \mid t, \boldsymbol{\mu}_t)$,
where $t$ is as before.

One can generate the whole process by drawing subsequent time points and
marks using the two conditional hazard measures: given any history of events
up to the $j$th event at some time point $t$, they generate the time and mark of the
$(j+1)$st event.
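The generation step can be sketched in Python under a simplifying assumption that is made here, not in the note: each conditional hazard measure has a constant rate between events, $\nu^x(ds \mid t, \mu_t) = \lambda^x(\mu_t)\,ds$. The rate functions `hazard_a` and `hazard_l` are hypothetical placeholders. Under this assumption the waiting time to the next event is exponential with rate $\lambda^a + \lambda^l$, and the mark is $a$ with probability $\lambda^a / (\lambda^a + \lambda^l)$. General hazard measures would need, for example, inversion of the conditional survival function instead.

```python
import random


def simulate(hazard_a, hazard_l, tau, rng):
    """Draw one realization of mu on (0, tau], one event at a time.

    hazard_a(history), hazard_l(history): hypothetical rates of the next a-event
    and next l-event given the history (a time-ordered list of (time, mark)),
    ASSUMED constant in s between events.
    """
    history = []
    t = 0.0  # time of the last event; the "zeroth" event is at time 0
    while True:
        rate_a, rate_l = hazard_a(history), hazard_l(history)
        total = rate_a + rate_l
        if total <= 0.0:
            return history  # no further events can occur
        # Waiting time to the next event of either type is exponential(total).
        t += rng.expovariate(total)
        if t > tau:
            return history
        # Given an event at time t, its mark is a with probability rate_a / total.
        mark = "a" if rng.random() * total < rate_a else "l"
        history.append((t, mark))


# Example: the action rate increases by 1 after each longitudinal data event.
rng = random.Random(1)
path = simulate(lambda h: 0.5 + sum(x == "l" for _, x in h), lambda h: 1.0,
                tau=5.0, rng=rng)
print(path)
```

The history is passed in full at every step, so the rates may depend on all earlier times and marks, exactly as the conditional hazard measures $\nu^x(\cdot \mid t, \mu_t)$ do.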

3 Treatment plans 

A treatment plan $g$ consists of subplans, one for each $j$ and $0 = t^l_0 < t^l_1 < t^l_2 <
\cdots < t^l_j$, which prescribe subsequent action timepoints, from time $t^l_j$ onwards,
so long as no further longitudinal data timepoint intervenes. We may therefore
further split the subplans into sub-subplans, one for each $j$ and each $k$, which
prescribe the time of the $k$th action timepoint after the $j$th longitudinal data
timepoint, so long as no new longitudinal data timepoint occurs. The moment
there is a new longitudinal data timepoint, the old subplan (or sub-subplan) is
discarded in favour of the relevant new subplan. Each subplan "assumes" that
the overall plan $g$ has been adhered to in previous segments of the history, so
each subplan "knows" all the preceding, planned, action timepoints as well as
the given preceding longitudinal data timepoints. Thus, if we are adhering to
a particular plan $g$, we can, for any sequence of longitudinal data timepoints
$t^l = (0 = t^l_0 < t^l_1 < t^l_2 < \cdots)$, thus for any outcome $\mu^l$, write down the complete
accompanying sequence of planned action timepoints, and thereby reconstruct
a complete outcome $\mu^g$ of a marked point process, given the component marked
point process outcome $\mu^l$. Moreover this can be done in an adaptive way:
$\mu^g_t = \mu^g|_{[0,t]}$ is a function of $\mu^l_t = \mu^l|_{[0,t]}$ and, of course, of the specific treatment
plan $g$ under consideration. We can therefore also compute, in an adaptive
way, an outcome $A^g = (A^{g,a}, A^{g,l})$ of the cumulative intensity process $\boldsymbol{A}$, or
an outcome $\nu^g = (\nu^{g,a}, \nu^{g,l})$ of the dual predictable projection $\boldsymbol{\nu}$, through its
dependence on $\mu = \mu^g$, as a function of any sequence of longitudinal data
timepoints $t^l = (0 = t^l_0 < t^l_1 < t^l_2 < \cdots)$, i.e., as a function of $\mu^l$.
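The reconstruction of $\mu^g$ from $\mu^l$ can be sketched in Python. As an assumption of this sketch, a plan is represented by a hypothetical function `next_action(history)` that plays the role of the sub-subplans: given the events so far, it returns the next planned action time. The reconstruction only consults the history up to the current time, which is the adaptivity property. The example plan `every(delta)` (act every `delta` time units after the latest longitudinal data timepoint) is likewise invented for illustration.

```python
import math


def follow_plan(l_times, next_action, tau):
    """Reconstruct the outcome mu^g on (0, tau] from the longitudinal times mu^l.

    next_action(history) is a hypothetical sub-subplan: given the events so far
    (time-ordered (time, mark) pairs), it returns the next planned action time,
    or math.inf if none. It is consulted again after every event, so a new
    l-event automatically discards the old subplan.
    """
    history, pending = [], sorted(l_times)
    while True:
        last = history[-1][0] if history else 0.0
        s = next_action(history)
        if s <= last:
            raise ValueError("the plan must schedule actions after the last event")
        next_l = pending[0] if pending else math.inf
        if min(s, next_l) > tau:
            return history
        if s == next_l:
            raise ValueError("actions and longitudinal events may not coincide")
        if next_l < s:
            history.append((pending.pop(0), "l"))  # new data: old subplan dropped
        else:
            history.append((s, "a"))  # action taken exactly as planned


def every(delta):
    """Hypothetical plan: act every `delta` time units after the latest l-event."""
    def next_action(history):
        last_l = max((t for t, x in history if x == "l"), default=0.0)
        last_a = max((t for t, x in history if x == "a" and t > last_l),
                     default=last_l)
        return last_a + delta
    return next_action


print(follow_plan([1.0], every(0.4), tau=2.0))
```

Running the same plan on a truncated sequence of longitudinal timepoints gives the same events up to the truncation time, which is one way to check adaptivity.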

4 g-Computation Formula

Suppose a plan $g$ is given. Suppose moreover that we are given a random variable $Y$,
taking values in some Polish space, which we consider as the outcome of interest. Alongside the "factual" outcome $Y$ we suppose there is also defined the
"counterfactual" outcome $Y^g$: the outcome which would have pertained, had
plan $g$ been adhered to. Now the conditional law of $Y$ given $\boldsymbol{\mu}$ can be considered as a function of $\mu$; as such we denote it as $\mathrm{Law}(Y \mid \boldsymbol{\mu} = \mu)$. Therefore,
for a given sequence of longitudinal data timepoints $t^l = (0 = t^l_0 < t^l_1 < t^l_2 < \cdots)$,
which determines a possible outcome $\mu^l$ of $\boldsymbol{\mu}^l$, we can evaluate the law of $Y$ given
$\boldsymbol{\mu}$ at $\mu = \mu^g = \mu^g(\mu^l, g)$. The g-computation formula, which
we want to prove under versions of the usual three assumptions of consistency,
no unmeasured confounding, and evaluability, is the following:

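$$
\mathrm{Law}(Y^g) = \sum_{n=0}^{\infty} \int \!\cdots\! \int_{0 < t^l_1 < \cdots < t^l_n < \tau} \; \prod_{i=1}^{n} \Big( \prod_{s \in (t^l_{i-1}, t^l_i)} \big(1 - A^{g,l}(ds)\big) \, A^{g,l}(dt^l_i) \Big) \prod_{s \in (t^l_n, \tau]} \big(1 - A^{g,l}(ds)\big) \; \mathrm{Law}(Y \mid \boldsymbol{\mu} = \mu^g).
$$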

The first thing to note about this formula is that it is a functional of the cumulative intensity function $A^{g,l}$ and of the conditional law of $Y$ given $\boldsymbol{\mu}$, both
considered as functionals of $\mu^g$, which in turn is a functional of the chosen treatment plan $g$ and of the summation and integration variables in the formula: the
total number $n$ of longitudinal data timepoints in the time interval $\mathcal{T}$ and their
values $0 = t^l_0 < t^l_1 < \cdots < t^l_n < \tau$. These variables precisely determine an
outcome $\mu^l$ of $\boldsymbol{\mu}^l$. The cumulative intensity function $A^{g,l}$ is computed from the
conditional probability laws of the 'next longitudinal data timepoint', restricted
to the event that it precedes the next action timepoint, given the history of
the process $\boldsymbol{\mu}$ up to the times of the zeroth, first, second, ... events. Thus it
depends on which version is chosen of each of these conditional probability laws.
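The factors $\prod_{s} (1 - A^{g,l}(ds))$ in the formula are product integrals. For a continuous cumulative intensity $A$, the product integral over $(u, v]$ equals $\exp(-(A(v) - A(u)))$, while an atom of size $\Delta A$ contributes a factor $1 - \Delta A$. Over an open interval $(t^l_{i-1}, t^l_i)$ the only difference is that a possible atom at $t^l_i$ is excluded. The following Python sketch approximates a product integral by a finite product over a fine uniform partition. The function `product_integral` and the example intensities are hypothetical and serve only to make the notation concrete.

```python
import math


def product_integral(A, u, v, steps=100_000):
    """Approximate prod_{s in (u, v]} (1 - A(ds)) by the finite product over a
    uniform partition of (u, v]. A is a hypothetical cumulative intensity
    function: non-decreasing and right-continuous."""
    result, prev = 1.0, A(u)
    for k in range(1, steps + 1):
        cur = A(u + (v - u) * k / steps)
        result *= 1.0 - (cur - prev)  # factor 1 - A(ds) for one small interval
        prev = cur
    return result


# Continuous A: the product integral is close to exp(-A((u, v])).
print(product_integral(lambda s: 2.0 * s, 0.0, 1.0), math.exp(-2.0))
# An atom of size 0.3 at s = 0.5 contributes an extra factor (1 - 0.3).
print(product_integral(lambda s: s + (0.3 if s >= 0.5 else 0.0), 0.0, 1.0),
      0.7 * math.exp(-1.0))
```

The finite product is used rather than $\exp(-A)$ because it remains correct when $A$ has atoms.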

Recall from Gill and Robins (2001) that there are two issues in establishing
this formula. The first is whether, when one chooses appropriate
versions of the conditional distributions involved, it gives the right answer. The
second is whether, when the conditional distributions are chosen, if possible, in some canonical fashion, the result is uniquely defined as a functional
of the joint law of the data $(\boldsymbol{\mu}, Y)$. We may have to face up to a third, more
technical issue: the formula supposes that in the counterfactual world where
treatment plan $g$ is followed, there is no explosion in the sequence of timepoints
of events. In other words, if we replace the conditional law of $Y$ in the integrand
with the constant function 1, the result of the g-computation formula should be
the total probability 1. Let us call this condition the no-explosion condition for
plan $g$.
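Under the no-explosion condition, the summand for fixed $n$ and $t^l_1 < \cdots < t^l_n$ is the probability density of a point process whose hazard for the next longitudinal data event is given by $A^{g,l}$, computed along the planned history $\mu^g$. The formula can therefore be read as an expectation: generate longitudinal timepoints one at a time from that hazard, insert the actions planned by $g$, and average $\mathrm{Law}(Y \mid \boldsymbol{\mu} = \mu^g)$. The Python sketch below does this by Monte Carlo under assumptions made only here. The hypothetical `hazard_l` is constant between events. The hypothetical plan `next_action` schedules actions strictly after the last event. The hypothetical `m(history)` stands for $E[h(Y) \mid \boldsymbol{\mu} = \text{history}]$ for some chosen function $h$. With $m \equiv 1$ the estimate is exactly 1, mirroring the no-explosion condition.

```python
import math
import random


def g_formula_mc(hazard_l, next_action, m, tau, n_draws, rng):
    """Monte Carlo reading of the g-computation formula (sketch).

    hazard_l(history):    hypothetical rate of the next l-event given the history,
                          ASSUMED constant in s between events.
    next_action(history): hypothetical plan g; returns the next planned action
                          time after the last event (math.inf if none).
    m(history):           hypothetical E[h(Y) | mu = history] for a chosen h.
    Returns an estimate of E[h(Y^g)].
    """
    total = 0.0
    for _ in range(n_draws):
        history, t = [], 0.0
        while True:
            s = next_action(history)  # planned action time under g
            rate = hazard_l(history)
            # Candidate l-time; redrawing after each action is valid because the
            # rate is constant between events (memorylessness).
            t_l = t + rng.expovariate(rate) if rate > 0 else math.inf
            if min(s, t_l) > tau:
                break
            t = min(s, t_l)
            history.append((t, "l" if t_l < s else "a"))
        total += m(history)
    return total / n_draws


def every_half(history):
    """Hypothetical plan: act 0.5 time units after the most recent event."""
    return (history[-1][0] if history else 0.0) + 0.5


# Example: each action halves the l-rate; h(Y) = 1{no l-event on [0, tau]}.
rng = random.Random(0)
estimate = g_formula_mc(
    hazard_l=lambda h: 1.0 / 2 ** sum(x == "a" for _, x in h),
    next_action=every_half,
    m=lambda h: float(not any(x == "l" for _, x in h)),
    tau=2.0, n_draws=10_000, rng=rng)
print(estimate)
```

In the example, with no longitudinal event the actions fall at 0.5, 1.0, 1.5 and 2.0, so the estimate should be near $\exp(-(0.5 + 0.25 + 0.125 + 0.0625))$, the product integral of $1 - A^{g,l}(ds)$ over $(0, 2]$.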

Now we discuss what the three usual conditions should look like, in this con- 
text, and make some remarks on how one might attempt to prove the formula. 

The consistency condition, in a sufficient and weaker 'in law' form, should
naturally be: $\mathrm{Law}(Y \mid \boldsymbol{\mu} = \mu) = \mathrm{Law}(Y^g \mid \boldsymbol{\mu} = \mu)$ for outcomes $\mu$ consistent with
plan $g$, thus outcomes $\mu$ such that $\mu^a = \mu^a(\mu^l \mid g)$, the action timepoints planned by $g$ given $\mu^l$. The 'no unmeasured confounders' assumption should be that the intensity process of the action events,
when the history of the process $\boldsymbol{\mu}$ is augmented by taking $Y^g$ to be a random
variable realized at time $t = 0$, should be the same as the intensity process of
the action events when only the history of $\boldsymbol{\mu}$ is taken into account, for outcomes
$\mu$ consistent with plan $g$. In terms of conditional distributions, it is the assumption that, conditional on the times and types of events up to any number of the
events, $Y^g$ is independent of the time to the next action event, restricted to the
event that it precedes the next longitudinal data event; and we only need to
check this condition for outcomes $\mu$ consistent with plan $g$. Just the consistency
and the no unmeasured confounders assumptions should be sufficient to establish the correctness of the g-computation formula, when the same conditional
distributions are employed in the formula as are involved in the assumptions.
Since typically the probability that $\boldsymbol{\mu}$ is consistent with $g$ is zero, this result has
no empirical content. Still, given versions of all involved conditional distributions, the result is not obviously true, so it does have mathematical content. The
first step in the proof is naturally to replace $Y$ with $Y^g$ on the right hand side
of the formula, using the consistency assumption. How to proceed from here is
not so clear. A strategy which might work is to consider the right hand side of
the g-computation formula, with $Y$ replaced by $Y^g$ and $\tau$ replaced by a variable
timepoint $\sigma \in \mathcal{T}$, as a function of $\sigma$, say $b(\sigma)$, and show that it satisfies some
integral equation. We are given the value of the function $b$ at $\sigma = \tau$. If one can
show the integral equation is uniquely solved by a constant function $b^*$ satisfying
$b^*(0) = \mathrm{Law}(Y^g)$, we are done. The no-explosion condition will presumably be
needed in this analysis. The important step is to guess a non-trivial probabilistic
interpretation of $b(\sigma)$, and take the guess to define a function $b^*(\sigma)$. Next, use
the probabilistic interpretation to write informally a relation between $b^*(\sigma + d\sigma)$
and $b^*(\sigma)$, as an expectation over the possible outcomes in the time interval $d\sigma$.
Use probability theory to convert this to a rigorous relation in integral form.

Informally, the proof should parallel that in the discrete time case and correspond to the remark that the law of $Y^g$ given $\boldsymbol{\mu}_{\sigma+d\sigma}$ does not depend on
$\boldsymbol{\mu}^a(d\sigma)$. Therefore, in order to recover the law of $Y^g$ given $\boldsymbol{\mu}_\sigma$ by averaging over
the conditional law of the events of $\boldsymbol{\mu}$ in the time interval $d\sigma$ given the events in
the past, we need only average over the conditional law of the longitudinal data
events. But whether or not there is a longitudinal data timepoint in this small
time interval is a Bernoulli$(\boldsymbol{A}^l(d\sigma))$ variable. Thus $\mathrm{Law}(Y^g \mid \boldsymbol{\mu}_\sigma)$ is a Bernoulli$(\boldsymbol{A}^l(d\sigma))$
mixture of the two distributions $\mathrm{Law}(Y^g \mid \boldsymbol{\mu}_{\sigma+d\sigma})$ with $\boldsymbol{\mu}^l(d\sigma) = 0, 1$.
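In display form the mixture statement reads as follows. This is only the same heuristic written out, with $d\sigma$ an infinitesimal interval, and not a rigorous identity:

$$
\mathrm{Law}(Y^g \mid \boldsymbol{\mu}_\sigma)
  = \bigl(1 - \boldsymbol{A}^l(d\sigma)\bigr)\,
    \mathrm{Law}\bigl(Y^g \mid \boldsymbol{\mu}_{\sigma+d\sigma},\ \boldsymbol{\mu}^l(d\sigma) = 0\bigr)
  + \boldsymbol{A}^l(d\sigma)\,
    \mathrm{Law}\bigl(Y^g \mid \boldsymbol{\mu}_{\sigma+d\sigma},\ \boldsymbol{\mu}^l(d\sigma) = 1\bigr).
$$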

Another possible ingredient is yielded by the remark that the law of $Y^g$
given $\boldsymbol{\mu}_t$ is a martingale in $t$ with respect to the history of $\boldsymbol{\mu}$, and hence can
be written as a stochastic integral with respect to $\boldsymbol{\mu} - \boldsymbol{\nu}$. The representation
involves the intensities of $\boldsymbol{\mu}$ with respect to its own history, and with respect to
the augmented history when $Y^g$ is realized at time 0.
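For a fixed measurable set $B$ in the outcome space, the remark can be written as follows. Here $H_B$ is a predictable integrand, a label introduced only for this display, whose explicit form is what the intensities with respect to the two histories would have to supply. The existence of such a representation for martingales with respect to the natural history of a marked point process, with trivial information at time 0, is the standard martingale representation theorem for marked point processes:

$$
P(Y^g \in B \mid \boldsymbol{\mu}_t) = P(Y^g \in B) + \int_{(0, t] \times \mathcal{X}} H_B(s, x)\, (\boldsymbol{\mu} - \boldsymbol{\nu})(ds \times dx).
$$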

In order to obtain a result with empirical content, we have to show how the
formula can be uniquely evaluated, under further assumptions, from the joint
law of $Y$ and $\boldsymbol{\mu}$. A natural assumption which guarantees a canonical choice of
conditional laws is continuity: we should assume that versions of all the conditional laws involved in the g-computation formula can be chosen so as to be
continuous on the support of the conditioning variables. The conditioning variables are partial histories of $\boldsymbol{\mu}$ up to the so-manyth event, and the total history
of $\boldsymbol{\mu}$ on $\mathcal{T}$. Continuity of probability laws is in the sense of weak convergence,
and the partial and total histories of $\boldsymbol{\mu}$ are given their natural topologies. The
conditional laws now have canonical versions on the supports of the conditioning variables, and we should make the evaluability condition on the plan $g$ that,
for partial histories in the support of the corresponding partial history of $\boldsymbol{\mu}$,
the next planned action time (restricted to the event where it precedes the next
longitudinal data timepoint) lies in the support of the conditional distribution
of that time given the partial history so far.